On generating near-optimal tableaux for conditional functional dependencies

Authors

  • Lukasz Golab
  • Howard J. Karloff
  • Flip Korn
  • Divesh Srivastava
  • Bei Yu
Abstract

Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs. This paper is the first to formally characterize a “good” pattern tableau, based on naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an “on-demand” algorithm that provides the same approximation bound and outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.
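
As a rough illustration of the greedy idea mentioned in the abstract, the following minimal Python sketch treats tableau generation as a partial set cover. It is not the authors' algorithm. It assumes that a pattern's confidence is the largest fraction of its matching tuples that can be kept without violating the FD, that support is the fraction of tuples matched by at least one pattern, and that parsimony means using few patterns. The names `greedy_tableau`, `confidence`, `candidate_patterns`, `matches` and the `_` wildcard are all hypothetical, and the sample rows are made up. The sketch enumerates every candidate pattern up front, which is exponential in the number of left-hand-side attributes and is exactly the cost an on-demand variant would try to avoid.

```python
from collections import Counter, defaultdict
from itertools import product

WILDCARD = "_"  # hypothetical marker for "any value"


def matches(pattern, x_values):
    """True if every pattern cell is a wildcard or equals the tuple's value."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, x_values))


def candidate_patterns(rows):
    """All patterns obtained by wildcarding any subset of a tuple's X values."""
    cands = set()
    for x, _ in rows:
        for mask in product([False, True], repeat=len(x)):
            cands.add(tuple(WILDCARD if m else v for m, v in zip(mask, x)))
    return cands


def confidence(pattern, rows):
    """Assumed measure: fraction of matching tuples kept when each X group
    retains only its most frequent Y value."""
    groups = defaultdict(Counter)
    for x, y in rows:
        if matches(pattern, x):
            groups[x][y] += 1
    total = sum(sum(c.values()) for c in groups.values())
    kept = sum(max(c.values()) for c in groups.values())
    return kept / total if total else 0.0


def greedy_tableau(rows, min_support, min_confidence):
    """Greedy partial set cover over patterns that meet min_confidence.
    Returns a list of patterns, or None if min_support cannot be reached."""
    n = len(rows)
    cover = {}
    for p in candidate_patterns(rows):
        if confidence(p, rows) >= min_confidence:
            cover[p] = {i for i, (x, _) in enumerate(rows) if matches(p, x)}
    covered, tableau = set(), []
    while len(covered) < min_support * n:
        # Pick the pattern covering the most uncovered tuples; ties are
        # broken by comparing the pattern tuples so the result is deterministic.
        best = max(cover, key=lambda p: (len(cover[p] - covered), p), default=None)
        if best is None or not cover[best] - covered:
            return None
        tableau.append(best)
        covered |= cover[best]
    return tableau


# Made-up data: X = (country, area_code), Y = city.
rows = [
    (("US", "212"), "NYC"),
    (("US", "212"), "NYC"),
    (("US", "212"), "NYC"),
    (("US", "415"), "SF"),
    (("US", "415"), "LA"),
    (("UK", "20"), "London"),
    (("UK", "20"), "London"),
    (("UK", "131"), "Edinburgh"),
]
tableau = greedy_tableau(rows, min_support=0.75, min_confidence=0.9)
```

In this toy data the two tuples with area code 415 disagree on the city, so no pattern with confidence of at least 0.9 can cover them; asking for full support at that confidence threshold makes the function return `None` rather than a tableau.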

Similar resources

A Unified Hierarchy for Functional Dependencies, Conditional Functional Dependencies and Association Rules

Conditional Functional Dependencies (CFDs) are Functional Dependencies (FDs) that hold on a fragment relation of the original relation. In this paper, we show the hierarchy between FDs, CFDs and Association Rules (ARs): FDs are the union of CFDs while CFDs are the union of ARs. We also show the link between Approximate Functional Dependencies (AFDs) and approximate ARs. In this paper, we show t...

Testing Implication of Probabilistic Dependencies

Axiomatization has been widely used for testing logical implications. This paper suggests a non-axiomatic method, the chase, to test if a new dependency follows from a given set of probabilistic dependencies. Although the chase computation may require exponential time in some cases, this technique is a powerful tool for establishing nontrivial theoretical results. More importantly, this a...

Comparison of Conditional Functional Dependencies using Fast CFD and CTANE Algorithms

Conditional Functional Dependencies (CFDs) are an extension of Functional Dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we take 4 techniques for cleaning the data from sample relati...

The Theory of Functional and Subset Dependencies Over Relational Expressions

A formal system for reasoning about functional dependencies (FDs) and subset dependencies (SDs) defined over relational expressions is described. An FD e: X → Y indicates that Y is functionally dependent on X in the relation denoted by expression e; an SD e ⊆ f indicates that the relation denoted by e is a subset of that denoted by f. The system is shown to be sound and complete by resorting to ...

Automated Reasoning to Infer all Minimal Keys

Wastl introduced for the first time a tableaux-like method based on an inference system for deriving all minimal keys from a relational schema. He introduced two inference rules and built an automated method over them. In this work we tackle the key finding problem with a tableaux method, but we will use two inference rules inspired by the Simplification Logic for Functional Dependencies. Wastl’s m...

Journal title:
  • PVLDB

Volume 1  Issue 

Pages  -

Publication date 2008